Higher-Order Certification For Randomized Smoothing
Randomized smoothing is a recently proposed defense against adversarial attacks that has achieved state-of-the-art provable robustness against $\ell_2$ perturbations. A number of works have extended the guarantees to other metrics, such as $\ell_1$ or $\ell_\infty$, by using different smoothing measures. Although the current framework has been shown to yield near-optimal $\ell_p$ radii, the total safety region certified by the current framework can be arbitrarily small compared to the optimal. In this work, we propose a framework to improve the certified safety region for these smoothed classifiers without changing the underlying smoothing scheme. The theoretical contributions are as follows: 1) We generalize the certification for randomized smoothing by reformulating certified radius calculation as a nested optimization problem over a class of functions.
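For context, the first-order certificate that this framework generalizes can be sketched as follows. This is the standard Gaussian-smoothing $\ell_2$ bound (radius $\sigma \cdot \Phi^{-1}(p_A)$), written as a minimal, self-contained illustration; it is not the paper's higher-order method, and the inputs are illustrative.

```python
from statistics import NormalDist

def l2_certified_radius(p_a: float, sigma: float) -> float:
    """First-order randomized-smoothing certificate: if the top class has
    smoothed probability p_a > 0.5 under Gaussian noise N(0, sigma^2 I),
    the smoothed prediction is constant within an l2 ball of this radius."""
    if p_a <= 0.5:
        return 0.0  # no certificate when the top class is not a majority
    return sigma * NormalDist().inv_cdf(p_a)

# Example: p_a = 0.9, sigma = 0.5 -> radius = 0.5 * Phi^{-1}(0.9) ~= 0.64
print(round(l2_certified_radius(0.9, 0.5), 2))
```

The abstract's point is that this certificate describes only an $\ell_p$ ball, whereas the true safe region of the smoothed classifier can be much larger.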
The paper received positive scores (7, 7, 7, 6), and all the reviewers appreciated the paper for the following: (i) theoretical contributions
We thank all the reviewers for their time and effort in providing feedback. For clarity, we would like to reiterate the goal and motivation of the paper. We address the individual concerns below. We thank R3 for pointing out the typo. The approximated network achieved 97.17% test-set accuracy; on the other hand, one of our networks resulting from edge-popup achieved 97.53% test-set accuracy. We would again like to thank the reviewers for the positive reviews.
on both our theoretical contributions showing an equivalence between a notion of training speed and the Bayesian marginal likelihood
We thank the reviewers for their helpful feedback. We now address some concerns. We have replicated the DNN experiments (S4.2). We can derive this result using Jensen's inequality. 'I was not able to ascertain how the result of Theorem 2 is used in the text; I'd be happy if the authors could clarify.' 'I found that the transition to the neural networks remains a bit confusing.' 'how much the results support the marginal likelihood-based model selection hypothesis, or whether they should more
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper derives policy gradient algorithms for risk-sensitive MDPs for a particular criterion, CVaR - a recent and popular criterion. First, the authors derive gradients for the objective based on a Lagrangian relaxation of the constrained optimization. This naturally yields a policy gradient algorithm where the expected return that appears in the gradient is estimated from full trajectories (REINFORCE-like). They then propose a scheme to obtain incremental actor-critic versions, where the critic computes the value (and other quantities) of an augmented MDP convenient for gradient estimation.
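The REINFORCE-like step the review describes can be sketched as follows. This is an illustrative simplification of the trajectory-based CVaR gradient estimator (only the worst alpha-fraction of trajectories contributes, weighted relative to the empirical VaR level); the function name and array shapes are assumptions, and the paper's Lagrangian and actor-critic machinery is omitted.

```python
import numpy as np

def cvar_policy_gradient(returns, score_grads, alpha=0.1):
    """Monte Carlo estimate of a CVaR_alpha policy-gradient direction.

    returns:     (N,) array of total trajectory returns R_i
    score_grads: (N, d) array of per-trajectory score functions
                 sum_t grad log pi(a_t | s_t)
    Only trajectories in the lower alpha-tail (below the empirical VaR)
    contribute, each weighted by (R_i - VaR_alpha).
    """
    var_alpha = np.quantile(returns, alpha)   # empirical VaR level
    mask = returns <= var_alpha               # tail trajectories
    weights = (returns - var_alpha) * mask    # (R_i - VaR) on the tail, 0 elsewhere
    return (weights[:, None] * score_grads).mean(axis=0) / alpha
```

The incremental actor-critic versions the authors propose replace the full-trajectory return estimates here with values computed by a critic on an augmented MDP.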
Review for NeurIPS paper: Untangling tradeoffs between recurrence and self-attention in artificial neural networks
Additional Feedback: - Line 145, how can Theorem 1 be related to the early attention mechanism [1]? As the attention weights are computed adaptively, it is unlikely that they are uniform. MANNs learn to store relevant hidden states in a fixed-size memory, which seems to serve the same purpose as the relevancy screening mechanism. What is the advantage of the proposed method over MANNs? How are MANNs related to Theorem 2? - The paper neglects prior works that also aim to quantify gradient propagation in RNNs and attentive models [4,5].
Reviews: Integrating Bayesian and Discriminative Sparse Kernel Machines for Multi-class Active Learning
Originality: The combination of sampling in areas of 'greater interest' while adjusting to the underlying distribution appears in many active learning works, but the objective in (1) is novel, and approaching both in a unified framework is challenging. The lower bounding of the optimization problem is also new. Quality: The experimental results are very thorough and show the improvement of the proposed method over random sampling as well as several other baselines, and the exploration of the effect of tuning parameters and initial sample size is excellent. However, the theoretical contributions appear incomplete. The significant theoretical contribution is the (mislabelled) Theorem 2, and both the statement and proof of this are extremely informal.
Review for NeurIPS paper: Joint Contrastive Learning with Infinite Possibilities
Additional Feedback: I think it is too strong to claim that "we also theoretically unveil the certain important mechanisms that govern the behavior of JCL." The main theoretical tool in the proposed method is an application of Jensen's inequality. There is also a section (3.3) that discusses some very basic properties of the objective. To claim any of this as a significant "theoretical contribution" is too strong in my view. To me, the most interesting aspect of Fig. 2 is part (b).
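The Jensen step the review refers to is the standard concavity bound $\mathbb{E}[\log X] \le \log \mathbb{E}[X]$, which is what turns an expectation inside a logarithm into a tractable lower bound on a contrastive objective. A quick numerical check of the inequality (purely illustrative, not code from the paper):

```python
import math
import random

random.seed(0)
xs = [random.uniform(0.5, 2.0) for _ in range(100_000)]

log_of_mean = math.log(sum(xs) / len(xs))             # log E[X]
mean_of_log = sum(math.log(x) for x in xs) / len(xs)  # E[log X]

# Jensen's inequality for the concave log: E[log X] <= log E[X]
assert mean_of_log <= log_of_mean
```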
Reviews: Implicitly learning to reason in first-order logic
This paper is generally well written and clear, albeit targeting readers with formal backgrounds. The quality of the paper seems high in terms of its formal claims. The proposed mechanism is remarkably simple, making this an attractive approach. I really like the idea behind not making learning explicit (as opposed to rule induction, for example). I have three main concerns about this paper: - In general it is very close to Juba's 2012 work [1].
Reviews: Generalization Properties of Learning with Random Features
This is, in my opinion, an excellent paper and a significant theoretical contribution to understanding the role of the well-established random-feature trick in kernel methods. The authors prove that, for a wide range of optimization tasks in machine learning, random-feature-based methods provide algorithms giving results competitive (in terms of accuracy) with standard kernel methods using only \sqrt{n} random features (instead of a linear number; this provides scalability). To my knowledge, this is one of the first results where it is rigorously proven that for downstream applications (such as kernel ridge regression) one can use random-feature-based kernel methods with a relatively small number of random features (the whole point of using the random-feature approach is to use significantly fewer random features than the dimensionality of the data). So far most guarantees were of a point-wise flavor (there are several papers giving upper bounds on the number of random features needed to approximate the value of the kernel accurately for a given pair of feature vectors x and y, but it is not at all clear how these guarantees translate, for instance, to risk guarantees for downstream applications). The authors, however, miss one paper with very relevant results that would be worth comparing with theirs.